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ABSTRACT 

Two iterative methods are considered, Richardson’s method and a gen- 
eral second order method. For both methods, a variant of the method is 
derived for which only even numbered iterates are computed. The variant 
is called a leapfrog method. Comparisons between the conventional form 
of the methods and the leapfrog form are made under the assumption that 
the number of unknowns is large. In the case of Richardson’s method, it is 
possible to express the final iterate in terms of only the initial approxima- 
tion, a variant of the iteration called the grand-leap method. In the case of 
the grand-leap variant, a set of parameters is required. An algorithm is pre- 
sented to compute these parameters that is related to algorithms to compute 
the weights and abscissas for Gaussian quadrature. General algorithms to 
implement the leapfrog and grand-leap methods are presented. Algorithms 
for the important special case of the Chebyshev method are also given. 
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1. Introduction. 

The subject of this paper is a set of techniques to improve efficiency in the 
iterative solution of a real or complex linear system Ax=b, especially for the 
solution of large problems on supercomputers. 

An iterative method generates a sequence x^, ... . For the 

methods of this paper, a variant such that x^ can be expressed directly in terms 
of x (t ~ 2 ) with no dependence on x^’ -1 ^ will be called a leapfrog method. A variant 
of Richardson’s method is also presented for which the final iterate is computed 
from the intial approximation with no computation of intermediate iterates. This 
will be called the grand-leap method. The advantages of the leapfrog and 
grand-leap methods are: (i) a slight reduction in some cases in the total number 
of arithmetic operations; (ii) an increase in the number of terms in vector sums, 
an advantage on supercomputers that “chain”, i.e., transmit results from one 
arithmetic unit directly to another; and (iii) a reduction in I/O operations for 
large problems. In his Ph. D. thesis [Chro86] studied methods to omit 
intermediate successive iterates for the conjugate gradient method as a way to 
allow parallel computation of matrix vector products. His goals overlapped 
somewhat with those of this paper but the approach is not the same. 

Two iterative methods are considered: Richardson’s method [FoWa60, 
HaYo8l] and a general second order iterative method. For Richardson’s method 
the leapfrog method was used in [Smol81, SmSa85] as a technique to avoid 
complex arithmetic. In this paper its other properties are explored. 
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Richardson’s method is an old method the advantages of which have 
generally been ignored; however, see [AnGo72]. In the symmetric positive definite 
case, Richardson iteration parameters do not yield an optimum iterate at each 
step whereas a second order method does. This is one reason for the neglect of 
Richardson’s method. However, in a paper of Tal-Ezer [Tal87], a novel approach 
is described in which Richardson iterates are almost optimum at each step. 

The Chebyshev iteration is an example of a second order method [Mnt77, 
Mnt78], used for the solution of nonsymmetric systems. The Chebyshev iteration 
is not applicable, however, unless the eigenvalues of A lie in a half plane. 
Furthermore, the Manteuffel adaptive algorithm [Mnt78] assumes the eigenvalues 
appear in complex conjugate pairs, which holds if the matrix is real. This is a 
brief argument for the use of Richardson’s method if the matrix is either a 
general real nonsymmetric matrix with eigenvalues in both half planes or is a 
complex matrix the eigenvalues of which do not appear in complex conjugate 
pairs. It should be stressed that large complex matrices arise in signal processing, 
and constitute an important class of problems. 

1.1. Summary. In §2, the leapfrog version of Richardson’s method is derived. 
Iteration parameters are assumed given, with the exception of the Chebyshev 
case for which explicit formulas are given as well as an algorithm. With properly 
chosen parameters, the method applies to any real or complex matrix. 
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In §3, the grand-leap method is presented for computing the final 
Richardson’s method iterate in terms of the initial iterate. An algorithm is also 
given. 

Comparisons among the conventional, leapfrog and grand-leap versions of 
Richardson’s method are made in §4. 

In §5, the general formula for a second order method is stated, and a leapfrog 
version derived. An algorithm (Algorithm 3) is stated in which the parameters are 
assumed given. Algorithms for these parameters are presented in §7. 

Optimum I^-iteration parameters are defined in the Chebyshev case in §2. 
In §6, L 2 -optimum parameters are „ defined. Optimum L 2 -Richardson’s 
parameters, in the case of real eigenvalues, are the roots of an orthogonal 
polynomial. An algorithm to compute roots of orthogonal polynomials is 
developed, which is an implementation of the Stieltjes algorithm [Gaut82 ] and 
related to an algorithm presented in [GoWe69] for the weights and nodes for 
Gaussian quadrature. This algorithm is modified to yield the quantities required 
to execute the grand-leap method. The Z, 2 -approach is only one approach to 
optimum parameters. A non-L 2 treatment is given in [ElSt85, Tal87]. Each of 
these references is more general than the L 2 -methods in §6. A completely general 
L 2 -approach may be based on [SaSm88] but is beyond the scope of this paper. 

Algorithms based on the methods in §6 are gathered and presented in' §7. 
Algorithms for the special and important Chebyshev case are also given in §7. 
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1.2. Conventions and Notation. Although an / 2 -inner product is a special 
case of the L 2 -inner product if the measure is chosen correctly, for clarity and 
convenience, the two are used separately. 

The solution of a linear set of equations Ax = b generally requires that the 
set be preconditioned by transforming it into a set such as, for example, 
CAx = Cb for which the iterative method converges more rapidly. (Other 
preconditionings yield systems such as CAQQ _1 x = Cb, but the same remarks 
hold for these other cases.) There is no change in the techniques or algorithms 
presented in this paper if they are applied to CAx = Cb rather than Ax = b 
other than the change in the matrix from A to C A. It will therefore be assumed 
that A is the preconditioned matrix. 

It will be convenient from time to time to state that an algorithm converges 
with no restriction on the input data and if certain conditions on the eigenvalues 
are met. In a practical sense, of course, there are restrictions on the data such as 
those needed to prevent overflow, which may be infeasible to analyze and 
formulate. 

Matrices and vectors are denoted by boldface type. 

The number of unknowns is denoted by N. 
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2. Richardson’s Method: Conventional Form and Leapfrog Form. 

This section begins with a statement of Richardson’s method, from which the 
leapfrog form is easily derived. The Chebyshev case is outlined and an algorithm 
given. 


2.1. Conventional Richardson’s Method. Let r 0 , ..., r k _ 1 be a cycle of 
iteration parameters where k is called the period. The purpose of iteration 
parameters is to reduce the error, but discussion of this is postponed until later. 
For now, attention is directed to the iteration, and the reader is asked to accept 
the parameters as given. 

Let x^ be an initial guess. Richardson’s method is defined as follows. For 

x — 1 , ».., k , 


x(‘) =x(* -1 ) +r 1 -_ 1 r(* -1 ) . (2.1.1) 


2.2. Leapfrog Form. The recursion 

r(*) =r^ ,_1 ) — r t _ 1 Ar^* -1 ^ (2.2.1) 

will be used, which may be derived by first subtracting (2.1.1) from x=x to 


obtain 


0 


e(*) =e (‘- 1 )-r t _ 1 r^'- 1 ) (2.2.2) 

where e^*^=x— x^, and then multiplying (2.2.2) by A. Since r^=Ae^, (2.2.1) 
follows. Vectors and r^ are called the (true) error and the residual error 
respectively. 

The leapfrog step from x^* -2 ^ to x^ results from using (2.2.1) in (2.1.1) to 
give [SmSa85], for i = 2, 4, ..., k (under the assumption that k is even) 

=x^ -2 ) +r,-_ 2 r(* -2 ) +r ,-_ 1 |l — r, _ 2 Aj r^ -2 ) 
-x('" 2 ) +(t,_, +r,. 2 )r(- 2 ) . 


2.3. Optimum Chebyshev Parameters. It is easy to show that the error 
and residual error satisfy 


e<‘>=.R t (A)e< 0 > , 

r (t) =.R*(A)r (0) (2.3.1) 

where R k is the polynomial 

k 

R kis) = n(i - t-,-^). 

: =1 


Any polynomial such that i? A (0) = l is called a fesidual polynomial [Stie58]. 
Parameters are chosen to minimize R k (A) in some sense, to be discussed next. 
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Let fi be a set containing the spectrum of A. Usually one thinks of fi as an 
interval or union of intervals on the real line. However if A is nonsymmetric, 
then both fi and the spectrum may lie off the real axis in the complex plane. Two 
commonly used methods to minimize -Rfc(A) are either to minimize the L^-norm 
of polynomial #*.(<;) over fi or to minimize a weighted L 2 -norm over fi. In this 
part, only the L^-norm will be discussed. How Chebyshev polynomials are used 
to minimise this norm will now be outlined. The papers of Manteuffel give more 
details [Mnt77, Mnt78]. 

The Chebyshev residual polynomial is defined by 



where T k is the Chebyshev polynomial of degree k, and d and c are parameters 
defining a confocal family of ellipses: d is the center and the foci are d ± c . 
Henceforth, in the Chebyshev case, the set 1? containing the nonzero spectrum 
will be an ellipse. (An ellipse, as the term is used here, means the union of the 
curve and its interior.) Parameter c is assumed either real or purely imaginary; 
in either case, c 2 is real. (If c =0, -Rjfc(f) reduces to (f — d) k jd k .) Assume that d 
is real and one member of the confocal family with center d and foci d ± c is in 
the interior of the right half plane, i. e., d > 0. (If there is one in the left half 
plane, then we consider —A instead of A.) These assumptions mean respectively 
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that the major axis of each ellipse of the family is either on the real line or that 
the major axis of each ellipse is perpendicular to the real line. If the major axis is 
on the real line, then the assumption that at least one member of the family lie in 
the interior of the right half plane means that d \ c | >0. Finally, note that if 
the eigenvalues are real, then ellipse Q may be assumed to be the interval 
[d — | c | , d + | c | ], which is the degenerate ellipse of the confocal family. 

Among all residual polynomials, it may be shown [Mnt78] that the 
Chebyshev residual polynomial has the minimum L^-norm over the interval 
Q — [d— | c | , d+ 1 c| ], and closely approximates the residual polynomial with 
minimum Z/^-norm over any ellipse, J?, with center d and foci d ± c . 

It is not necessary that d and c 2 be real in order that the Chebyshev 
iteration converge. The reason for assuming above that these are real quantities 
is connected with the Manteuffel algorithm [Mnt78]. The Manteuffel algorithm is 
valid only when the eigenvalues of A appear in complex conjugate pairs, and it is 
this that leads to the assumption that d and c 2 are real. In general for any d and 
c 2 , convergence results if there is one ellipse with center d and foci d ± c 
containing the eigenvalues that does not also contain the origin. As a practical 
matter, however, the Chebyshev iteration is useful only when d and c can be 
obtained by some technique such as the Manteuffel algorithm. 

In the Chebyshev case the Richardson iteration parameters are derived from 
the roots, {p i }, of T k which are 
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Pi =cos 


7T- 


1+2* 


2 k 


=0, k— 1. 


The roots of R k are therefore d + cp t -, for * =0, A:— 1 and the parameters for 

Richardson’s method are 


1 

T i = — r ■ 

c pi + d 


2.4. An Algorithm in the Chebyshev Case. First a technical note on 

avoiding complex arithmetic: If the major axis is vertical, i.e., if c is pure 

imaginary, then the roots of R k occur in conjugate pairs. It is an advantage to 
order the parameters {r, } in such a way that, in this case, r,_! and r,_ 2 are 
conjugate pairs (in order that r,_ 1 + rj_ 2 and r,_ 1 r, _ 2 , required by the algorithm, 
are real), a task equivalent to ordering the roots, {/>,• }, of T k in such a way that 
Pi -2 = ~Pi-v 

Let h = 7t/2 k. The roots of T k are the cosine’s of 

#1 = h , d 2 = 7r — h, 0 3 = Zh, O 4 = 7T — 3h, ... . If k is even, which it is in the 

7T 

leapfrog case, the last two roots in this ordering are the cosine’s of 0 fc-1 = h 

2 

7T 

and 0 k = 1- h. Moreover, cos0j = — cos0 2 , etc. In the algorithm, the formula 


0 , = 


2 


» + 1 
2 


- 1 


(— 1)* l h + 7rmod(2 +1, 2) 
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for i — 1, k, will be used. If p, = cos^ , then in the cjse when c is pure 
imaginary, {d + cp v d + cp 2 }, {d + cp 3 , d -f cp 4 }, ... is a sequence of conjugate 
pairs. 

If the major axis is real, the error is reduced after a cycle of exactly k 
parameters, but the algorithm may not have converged. If the major axis is 
vertical, the error is reduced only if k is sufficiently large, a requirement that in 
practical applications, however, is observed to be reasonable. If the algorithm has 
not converged, the cycle of parameters is repeated. After the parameters have 
been recycled l times, the error, = x — x^ kl \ satisfies = jf? fc (A)j e^ 0 ^. 

But Ru{<:) ¥= where R ki is the optimum Chebyshev residual polynomial 

of degree kl. This is a basic problem with Richardson’s method: It is optimum 
only at the end of one cycle of parameters. For an alternative approach, not 
based on Chebyshev parameters, see [Tal87] in which a method is proposed to 
increase the number of parameters in an almost optimum way (thus the period is 
not fixed) until convergence is achieved. 

Algorithm 1. (Leapfrog Richardson’s method in the Chebyshev case.) 

Purpose . Execute Richardson’s method with Chebyshev parameters and omit 
alternate steps. 

Input. Matrix A, right side b, initial guess x^, cycle k, and ellipse 
parameters d and c. The ellipse parameters are assumed known, for example, as 
output from the Manteuffel algorithm [Mnt78, Ashb85]. The user must also 
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provide a maximum number of cycles of iterations and an error criterion to halt 
the iteration. 

Output. Iterate x^, the iterate reached after the last cycle of k parameters 
in the standard execution of Richardson’s method. 

Restrictions. If d and c 2 are real, then d is assumed either positive or 
negative and will be assumed positive without loss of generality. 
Also, for real c, 0 < d — | c | . In general for any d and c 2 , convergence results if 
there is one ellipse with center d and foci d ± c containing the eigenvalues that 
does not also contain the origin. If the matrix is singular, the algorithm 
converges to a solution if the system is consistent. Period k is even. 

Notes, (l) Quantities 0 i} p, , and r t need not be array variables since only 
three values are used during execution, but subscripts make the algorithm more 
convenient to state. (2) A slight modification of the algorithm would allow {r^ } to 
be an input array, for example, from Algorithm 4. 


1) Set h := . 

2k 

2) Do either until convergence or a limit on the number of loops is exceeded. 

2.1) For i = 2 to k by 2 do: 


2.1.1) Set 0 t _ x 



2.1.2) Set 


2 


» + 1 
2 


1 1 ( — 1)* 2 h +7rmod(t,2). 

(— 1)’ ~ l h + 7rmod(f + 1, 2). 
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2.1.3) Set p t _ 1 := cos0 i _ 1 . 

2.1.4) Set /),• := cos 0, . 


2.1.5) Set r t _ 2 


1 

d + cpi_ x 


2.1.6) Set r, , := . 

’ ‘ 1 d + cp,- 


2.1.7) Set a := r ,_ 2 + r,.! . 

2.1.8) Set i/ := r,_ 2 r t _ 1 . 

2.1.9) Set r^* -2 ) := b — Ax^* -2 ^ . 

2.1.10) Set t := Ar(* -2) . 


2.1.11) Set x ( ‘) := x (, -2) + ar ( ‘ _2) - ut . 


2.1.12) Endfor. 


2.2) If not converged, set x^ := x^ . 


2.3) Enddo. 
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3. Richardson’s Method in One Step. 

It is easy to see that leapfrog could be continued further to allow computing x^ 
from x (t_4) . Ultimately, one arrives at an expression for x^ in terms of x^ with 
no intermediate approximations, the form of which is, as will be seen 
momentarily, 


x (fc) =x (o) +Cfc _ i ( A ) r (o) ^ (3.1) 

where Cj-.j is a polynomial of degree k— 1. Computing x^ only from x^ while 
omitting the computation of any intermediate approximation will be called the 
grand-leap. 


3.1. Krylov Subspace Methods. Richardson’s method is an example of a 
Krylov subspace method, i. e., for 1 < t, 

x (») _ x (°) ev t , (3.1.1) 

where V , • is the Krylov subspace defined by 


V i = span 


r<°» A-'rW 


The proof of (3.1.1) is an easy induction based on x^=x^ 1 ^+r,_ 1 r^ 1 1. 
Membership relation (3.1.1) implies (3.1). 
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3.2. An Expression for C jt _ 1 . Multiply 


x (*) _ x (°) =x^ —x +x — x^ =C' fc _ 1 (A)r l 


by A to obtain 


r (o) _ r (A=) =A c' Jt _ 1 ( A ) r (°>. 

Since 

rW=R k { A)r(°) , 

it follows that 




1 — Rk (f) 

£ 


which is a polynomial since i2 fc (0) = l. 


Therefore if 


R k[$)=0kf + * ’ * +01C+ 1 


then 


c k-i{d =& k^ *+ * * * +0i- 


The representation of any polynomial in f in terms of powers of f i 


( 3 . 2 . 1 ) 
called the 


power form. 
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3.3. A Remark on Polynomial Preconditioning. Assume that residual 
polynomial R k is small on set J? (containing the spectrum of A.) Therefore, on /?, 
is an approximation to f -1 , and C k _ l (A) is an approximation to A -1 . 
Polynomial C k _ x arises in so-called polynomial preconditioning [Adm82, AMS87, 
Chen82, JMP83, Sayl83, Tal87]. 


3.4. Methods to Compute C jfe _ 1 (A)r^. In the important Chebyshev case, 


**(?)- 



The coefficients, 0, , could be easily determined by expanding T k 


d -f 

c 


in terms 


of powers of f. In principle, the coefficients of any residual polynomial could be 
determined in the same way, although no residual polynomial is as well 
documented as the Chebyshev case. 


Despite the simplicity of this approach, it has the unfavorable feature that 
even when the coefficients, 0, , are known explicitly, it is numerically difficult to 
compute the vector d =0 k A k ~ 1 r^ + • • • +0^°^ due to the ill conditioning of the 
basis (r^, • • • ,A*~M 0 )}, if k is large. However, to avoid instability it is often 
sufficient to take a small value of k, say k =5. If the power form coefficients are 
known then nested* polynomial evaluation (Horner’s rule) could be used to 
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compute C k _ 1 (A.)r(°K 

An algorithm is presented later, Algorithm 2, in which the roots, 
{<7 t - : i =1, ..., k — l}, and the leading coefficient, g^-x, of Cfc-i will be assumed 
given. The roots of C k _ x may be computed from the power form (for example by 
computing the eigenvalues of the companion matrix.) A method and algorithm 
(Algorithm 5) are presented in §6 that do not require the power form coefficients. 
Also see [Tal87] for a non-L 2 approach. 

3.4.1. Avoiding Complex Arithmetic. If A is real, then it is reasonable to 
assume that the coefficients of C k _ x are real. If so, then the roots of C k _ x occur 
in complex conjugate pairs. Let a and a be a conjugate pair. Since 

(A— <r)(A— o)u = |a 2 — (a+o)A + 1 a\ 2 ju , 

no complex arithmetic is required to evaluate C' fc _ 1 (A)r^ 0 ^ when the factored form 
is used. (In the general case when A is complex, the roots do not occur in 
conjugate pairs.) 

3.5. Algorithm for the Grand— Leap. 

Algorithm 2. (Compute x^ =x^ + C fc _ 1 (A)r^ 0 ^. / ) 
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Purpose. Compute the final iterate from the initial guess with no 

intermediate iterates computed, except those at the end of each cycle. 

Input. Matrix A, right side b, the intitial guess x^, period k, the leading 

k - 1 

coefficient, g k -\ and the roots cr 1 , ..., <r Jfc _ 1 of C k _ v i.e., C'jfe-i(c) = g k -\ n (f — <7j). 

Parameter t q , which is the reciprocal of the root of i? 1 , is required if k = 1. These 
parameters are generated from Algorithm 5,. and, in the Chebyshev case, from 
Algorithm 6. However, there are other sources for the parameters such as [Tal87j. 
The user must also provide a maximum number of cycles of iterations, and an 
error criterion to halt the iteration. 

Output. Iterate x^, the last iterate reached after a cycle of k parameters in 
the standard execution of Richardson’s method. 

Restrictions. The algorithm executes with no restrictions on the input data. 
However in order for the algorithm to converge to the solution of Ax = b, for all 
b, it is necessary that | R k (\)\ =| 1 — X, C k _ 1 (\ )| <1 for each nonzero 

eigenvalue, \ , of A. If this holds and A is singular, the algorithm converges to a 
solution when the system is consistent. 


1) Set r (0) :=b-Ax (0) . 

2) If A; =1 then 


2.1) Set xW—x^+rjr^. 
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2.2) Return. 

3) Separate the roots {a, } of C k _ x into real roots and conjugate pairs of (nonreal) 

roots: o x , ..., o m are real; <r m+1 , •••> °k - 1 are nonreal and 

<7y + i =Oj,m +1<J <k —2. 

4) Do until convergence or a limit on the number of loops is exceeded: 

4.1) Set :=b — Ax^ . 

4.2) Set 

A: — 1 r / \ l m 

x ( k ) := X i*)+g k _ x ^ n +i |A [a - (a j +<r j+1 ) j + <Tj(Tj + i j X n^A.-<?j)rW . 

4.3) If not converged set 

Enddo. 


! 
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4. Comparisons. 

To display some of the advantages in the leapfrog and grand-leap approach a 
side-by-side comparison of algorithms is made in Table 1. The conventional 
Richardson’s method is compared to the leapfrog version, in which alternate steps 
are omitted, and to the grand-leap version in which all intermediate steps have 
been omitted. The period, k , is assumed even. 

The operations shown in Table 1 form the kernel of a loop, the commands 
for which have been omitted. The details for a complete algorithm have been 
given already and would be distracting here. 

On any computer, reducing the number of arithmetic operations, the 
traditional goal of algorithm design, is an advantage. In the case of the leapfrog 
and grand-leap versions, it is a thin advantage but an advantage nevertheless and 
one that is unexpected. (Since Richardson’s method is a Krylov subspace method, 
the number of matrix- vector multiplications cannot be reduced.) It is a further 
advantage on supercomputers that do chaining that there are more terms in the 
leapfrog expression for x^* ^ than in the conventional expression. 

Some additional comment is needed on how arithmetic operations are 
counted. The number of arithmetic operations given in the table is based on the 
assumption that there is no mixed real and complex arithmetic. Let us consider 
when mixed arithmetic occurs. The Table 1 parameters are 
{r t - , cr, , Tj + r t , r,- r t - , a,- + c, , er, a,- }. In the Hermitian symmetric positive definite 
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case, the roots of R k are real, the Table 1 parameters are real, and there is no 
mixed arithmetic. (It is reasonable to assume this. A Hermitian positive definite 
system could be solved with nonreal parameters, however.) If A is a general 
complex nonsymmetric matrix, all Table 1 parameters are general complex 
quantities and since the matrix is complex, there is again no mixed arithmetic. 

Mixed arithmetic occurs in Richardson’s method if A is real and 
nonsymmetric, for then the Table 1 parameters are general, complex quantitie ... 
whereas the other quantities are real. If A is real, it is reasonable to assume that 
polynomials R k and C fc _ 1 are real. The roots may then be grouped in conjugate 
pairs, and the leapfrog and grand-leap methods performed in real arithmetic. 
Richardson’s method, however, requires complex arithmetic, and the number of 
arithmetic operations is effectively larger than shown in Table 1. In this case, one 
would not want to consider Richardson’s method, which was the motive for using 
the leapfrog method in [SmSa85]. 

Now we come to an aspect of these comparisons, namely the effect on I/O 
due to the solution of large systems, that is important to take into account but is 
necessarily limited due to the range of the subject. 

The limitation made here is to consider only programmer-controlled storage, 
from among a list of topics required for a more complete discussion that includes 
architectures, specific application problems, and implementation details. The 
reader may object that although it is reasonable to dismiss architectures, it is still 
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not reasonable to restrict discussion in quite this way. For, the typical user is 
running problems on a virtual memory machine and is beset with multiple 
worries that deserve attention, such as memory “touches”, or the loading of 
vector registers, or the losses due to flushing a cache. Unfortunately, such 
transfers between memory levels are hardware dependent and simply cannot be 
analyzed within the scope of this paper; the conclusions reached here below do 
not necessarily hold in these cases. It should be noted, however, that even for 
virtual memory systems, there exist limits [Eccl83] that compel the use of explicit 
I/O commands similar to those in Table 1. 

One final comment to justify the narrow focus that is taken: It is 

characteristic of many supercomputers that only programmer-controlled 
peripheral storage is available for large problems and when needed is usually 
responsible for languid performance. This dismal fact often attracts comment. 
For example, Ortega and Voigt observe, “The [programmer-controlled] I/O 
problem produced by very large problems [is]... known to be potentially 
devastating on high performance systems ... .” [OrVo85]. Programmer-controlled 
storage includes system commands, custom utilities, and the less efficient choice, 
depending on circumstances, of Fortran commands. Only Fortran commands are 
given in Table 1. 

In order to weigh the effect of transfers from peripheral storage, a large set^of 
linear equations is assumed. This vague statement will be sharpened in order to 
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arrive at a rather specific assumption. The discretization of coupled partial 
differential equations in three dimensions yields, in some applications, leviathan 
systems of order ten million complex unknowns. Such problems lead to the 
assumption that a matrix multiplication, which may involve a preconditioning, 
absorbs the primary memory and that processing after a matrix multiplication 
requires reading in a vector from disk. This assumption is seen in Table 1 when, 
for example, in the leapfrog algorithm, ” 2 ^ must be read from disk after 
computing t = Ar^ ~ 2 ). 

Under these conditions, a third advantage of the leapfrog and grand-leap 
algorithms is seen: there are fewer READ’S and WRITE’s. 

In an actual implementation, it may well happen, for example, that two 
matrix READ’S are not necessary in any algorithm and that in the 

conventional algorithm need not be written on disk. Conditions will vary, and 
the results in the table are only representative. If the assumption on matrix 
vector multiplications is not valid, the comparisons would change, but it is 
plausible that for large problems there would remain an I/O advantage to the 
leapfrog and the grand-leap formulations. 
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Table 1 


Conventional 

Leapfrog 

Grand-Leap 


\ 

+ 

<M 

1 

II 

a=cr+o 


i/ = r,_ 2 r t _ 1 

V = 0<7 

READ A 
v=Ax (!-2) 

READ A 
v=Ax(* -2 ) 

READ A 
v x =Au^’ -2 ) -ftu 

READ b 
r «-’)=b-v 

READ b 

r('- 2 )=b— v 

READ A 
v 2 =A Vl 

READ 

x («-l) =x («-2) +r ._ 2r (t-2) 

WRITE 

READ A 
t=Ar< i - I > 


READ A 
v=Ax (,-1) 



READ b 
r (*'- 1 )=b-v 

READ r(* _2 ) 


READ x (,_1) 

x (0 +r, _ 1 r^ -1 ^ 

READ x^-?) 

x (* ) =x (* ~ 2 ) _j_ ar (* - 2 ) 

READ u(*'" 2) 

U^ , ^=V 2 +l / U^‘" 2 ^ 

WRITE x (,) . 

— z/t 

WRITE x^ 

WRITE u ( ‘) 

2 matrix mults. 

2 matrix mults. 

2 matrix mults. 

4 vector READ’s 

3 vector READ’s 

1 vector READ 

2 matrix READ’s 

2 matrix READ’s 

2 matrix READ’s 

2 vector WRITE’s 

1 vector WRITE 

1 vector WRITE 

4N adds 

3N adds 

2N adds 

2N mults. 

2N mults. 

2N mults. 
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5. Second Order Iterations. 

In the real eigenvalue case, residual polynomials of practical value are orthogonal 
polynomials, and satisfy a three term recursion. This elegant property yields 
second order methods, which have an extra term in the expression for the new 
iterate as compared to Richardson’s method. Richardson’s method is also called a 
first order method and a second order method sometimes called Richardson’s 
second order method. In a second order method, each new iterate is optimum in 
the sense that the residual polynomial satisfies an I/ 2 -optimality property, to be 
be discussed in §6. The Chebyshev iteration, employed by Manteuffel [Mnt77], is 
an example. In the case of a first order method, is optimum if the residual 
polynomial, R k , is optimum, but x^ is not optimum for i =£k, a fact commented 
on previously in the Notes for Algorithm 1. There is a cost in the second order 
method for optimality at each step: a larger number of arithmetic operations and 
a greater use of storage compared to Richardson’s method. 

The objective is a second order method for which only even numbered 
iterates are computed using information only at even numbered steps, a method 
hereafter called a second order leapfrog method. In the Chebyshev case, an 
algorithm will be given. 
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5.1. The Second Order Iteration. Some preliminaries are needed. Let 
be the given initial guess. Define Ax^ -1 ) to be zero and for 0 <k, 

The second order iteration requires a set of parameters { a % , % : 1 < k} 
that are given explicitly in the Chebyshev case in Algorithm 3, and derived in a 
general way in Algorithm 4. Assume these parameters are given. The iteration 
may now be stated. Let r^ = b — Ax^. For k >1, 

Ax^ -1 ) =7 t 4x^ _2 4a fc r(* _1 \ (5.1.1) 

x (*) =x (*-i) + 4 x (*-i) f 

and 

r^=b— A*W. 

5.2. Second Order Leapfrog. The derivation is somewhat lengthier than in 
the case of Richardson’s method due to: the need to express Ax^ in terms of 
information at step k —2; and a complication involving the residual vector. 

First, an expression for x^, k >2, is obtained in terms of information at 


step k — 2. Since 
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it follows that 


[ ( t ) = x (‘- J ) + 4 x (‘- 2 ) + Ax (‘-0 


Now use (5.1.1) in the last equation to obtain 


Define 


Then 


c W^-2) +Ax (k-2) + Ax (k-2) + (k-l) 


Ar (k-2) =r (k-l) _ r (k-2) 


c (*> =x^ k ~ 2) +Ax {k ~ 2] + lk Ax^ + a k [r^ + Ar^ 


It remains to express and Ax^ in terms of information at step k —2. 

The expression for Z\r^ is simply 


At^ = -AAx {k) 


Finally, to obtain Ax^ k \ 


AxW = oc k+l r W +i k+1 Ax( k ' 


= a fc+i r ^+'7jfc+i [“it [ r ^ +Ar^ k 
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To summarize, the formulas to go from step k — 2 to step k are 

x (*) =x (*— 2)_|_ | r (*-2) + £ T [k - 2 ) j ^ A x ( k ~ 2 ) j 

+ Ax .( k ~ 2 ' 1 , 

= b— Ax^ , 

4x<‘> = [o t (r(‘- ! >+4rl l - 2 »]+ 1l 4xt i - 2 )], 

= — AAx^ . 

Initially, 

x (°) = given , 
r(°) = b-Ax(°) 

Ax* 0 ) = a ir (°), 


Ar*°);=-AAx (0) . 
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Table 2 

Conventional Second Order 

Leapfrog Second Order 


READ Ax^ k ~ 2 ^ 
READ r (fc-2) 

READ x (fc_2) 

READx (fc-2) 

^ x (A:-2)_^^_ i ^ x ( fc-3 )-|- Q ; jt _ 1 r( A-2 ) 

w = a k jr^ -2 ) + Ar^ -2 ) j + 'y k Ax^ k ~ 2 ^ 

WRITE Ax^^ 

WRITE w 

x (*- 1 ) =x (* -2 )_(./\x^ -2 ^ 

x (*) = x^ -2 ^ + w + Ax^ k ~ 2 ^ 

WRITE x (fc_l) 

WRITE x (fe) 

READ A 

READ A 

v=Ax^ -1 ) 

v=Ax^' 

READ b 

READ b 

r (*-i)=b-v 

r (*)= b— ' V 

READ Ax^ k ~ 2 ^ 

READ w 

Ax^~ 2) +a k r < ^ k ~ 1) 
WRITE Ax ( * -1) 

Ax^ = ak+ir {k) +l k+ jw 
WRITE Ax^ 

READx ( * -1) 
x (*) = x^ -1 ^ + 
WRITE x (A) 

! 

WRITE 

READ A 

READ A 

v = Ax^ 

= — AAx^ 

READ b 

= b — v 


2 matrix mults. 

2 matrix mults. 

6 vector READ’s 

5 vector READ’s 

2 matrix READ’s 

2 matrix READ’s 

4 vector WRITE’s 

4 vector WRITE’s 

6N adds 

6N adds 

4N mults. 

4N mults. 



i 
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5.3. Comparisons. Under the same assumptions as for the previous set of 
comparisons, the two versions of the second order iteration are compared in Table 
2. There are fewer advantages of the leapfrog algorithm in this case since the 
number of arithmetic operations and the number of WRITE’s is the same. The 
advantages are that there are fewer READ’s and a greater number of terms in the 
sum defining x^) in the leapfrog version. Note that variations are possible, for 
example, in recomputing w in the leapfrog version, and that the arrangement of 
terms used here is not necessarily suitable for a particular problem or 
architecture. 

5.4. Algorithm for the Second Order Leapfrog Method in the 
Chebyshev Case. For the convenience of the reader, an algorithm is given 
below for the case of Chebyshev parameters. As before with Algorithms 1 and 2, 
no attempt is made to incorporate I/O statements. 

Algorithm 3. ( Second order leapfrog iteration with Chebyshev parameters.) 

Purpose. Execute the leapfrog form of the second order iteration for the 
Chebyshev case. The parameters are the same as for the standard second order 
Chebyshev iteration as used for example in the Manteuffel algorithm [Mnt78, 
Ashb85]. 

Input. Matrix A, right side b, and initial guess x^; also a pair d and c such 
that d is the center and d ± c are the foci of a family of ellipses over which the 


r 
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Chebyshev residual polynomial is (nearly) minimum with respect to the uniform 
norm. The ellipse parameters are assumed known, for example, as output from 
the Manteuffel algorithm [Ashb85|. In the general non-Chebyshev case, this 
algorithm could be easily modified to allow {a* } and {7 k } to be input parameters, 
say, from Algorithm 4. 

Output. The algorithm generates a set of optimum iterates converging to the 
solution of Ax = b if the restrictions are satisfied. 

Restrictions. The restrictions are the same as for Algorithm 1. 

Notes. The a k and 'y k parameters need not be array variables; the subscripts 
aid clarity. 

1) Set := b — Ax^. 

2) Set 07 := l/d. 

3) Set Ax^ := a 2 r^. 

4) Set := — AAx^. 

5) SetQ2;= _^_. 

6) Set 7 2 — da 2 — 1. 

7) Do k = 2 by 2 until either convergence or a limit is exceeded: 
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7.1) Set w := a k jr^* 2 ^ + Ar^ k 2 ) j + 7*. 2 \ 

7.2) Set x (<;) := x (fc-2) + w + Zlx (fc_2 l 


7.3) Set r (fc) := b - Ax (fc) . 


7.4) Set a fc+1 := 


d - 


1 



7.5) Set 7 * +1 := da k+l - 1. 


7.6) Set oj^2 • — 



7.7) Set 7jb+2 •“ ^ a ife+2 


7.8) Set 4x (i) := ajfc +1 r (fc) + 7*+i w - 


Enddo 
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6. L 2 -Optimum Parameters. 

If either the L 2 - or / 2 -norm is used to define optimum residual polynomials, then 
it turns out that optimum residual polynomials form a family of orthogonal 
polynomials if the inner product (either integral or sum) is defined over a real set. 
From this fact, algorithms follow for the computation of the r-parameters for 
Richardson’s method, the ^-parameters for the grand-leap method, and the 
parameters for the second order method, which are presented in this section. The 
assumption that the inner product is defined over a real set usually means that 
the eigenvalues of the system matrix are real. The Chebyshev case is an 
exception for which the eigenvalues need not be real. (Since Chebyshev 
polynomials form an orthogonal family, Chebyshev residual polynomials are L 2 - 
optimum as well as L ^-optimum.) 

The algorithms ' in this section generalize to the case of an inner product 
defined over a contour in the complex plane. (A generalization may be based on 
[SaSm88].) 

6.1. L 2 -Optimality. Some notation is necessary. Let F be an interval or a 
union of intervals on the real line (generally, F could be any measurable subset) 
and let w be a positive weight function on F. Define 

if,9)w =7 -J7(£M£M£) d £ 

L r 
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where L = fw(£)d£. The set r may be assumed to be real by a linear change of 

r 

variables if necessary. In practice, rather than the continuous inner product, one 
would use a discrete inner product of the form 

i M 

if ,g)w = f 

where m(f) is a measure, such as | . A norm is defined by 

II f\\i=(f, f) w . 

An (L 2 -) optimum residual polynomial of degree k is defined to be that residual 
polynomial, R k , with the smallest norm, 

ll 

where P k is any residual polynomial of degree k. Ideally, r should contain the spectrum 
of A, and conform to the spectrum as closely as possible. Thus if the spectrum were 
contained in the union of two intervals, r should also be the union of, if possible, the 
same two intervals. How to find the interval or union of intervals containing the 
spectrum is a difficult problem, and is not considered here; the reader is referred to the 
Manteuffel algorithm [Mnt78] (which, however, computes only one interval containing 
the spectrum.) If {i? t } is a set of optimum residual polynomials, then [Stie58] they form 
an orthogonal set with respect to the modified weight function, 


(Ri,Rj)irv =0 


( 6 . 1 . 1 ) 
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if and only if i 


8.2. The Recursive Property of Orthogonal Polynomials. The well-known 
three term recursion for orthogonal polynomials is recalled, a property that yields not 
only a second order iteration, which is derived here, but, also in §7.1, an algorithm for 
computing, among other things, the roots of the optimum residual polynomial, needed in 
order to execute Richardson’s method. 

Define <j>_ x to be zero, and let (j> 0 be a nonzero constant. A family, {<j> k :0<A;}, of 
orthogonal polynomials satisfies a three term recursion: for 1 < k , 

4MH<*kt+0k)<t>k-i{d-ikh-2{t ) . ( 6 . 2 . 1 ) 

where a k ,/3 k ,' f k are recursion coefficients given by 

Pjc ~ (frftfc— l)u, 

" « h-i\\ l ’ 

Ik _ U^k-V^k-2)w 

a k II ^- 2 II l 

One coefficient is a parameter that' allows a normalization, such as, for example, 
|| <^11 w =l. If {<f> k : 0 <fc} is a family of residual polynomials, the desired 
normalization is <j) k (0) = l, which yields /?jt = l k + 1 and, with R k {£) = <£*.(£), 
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«*(£)=(<**£ +1* +i)R t -i(f)-'r»«t-s(a • <“- 2 - s > 

6.3. Second Order Iteration. The recursion for the residual polynomials yields an 
iteration for which x ^ is L 2 -optimum in the following sense: The error, e^=x— x^, 
satisfies =R k ( A)e^, where R k is an L 2 -optimum residual polynomial. 

To derive the iteration, replace £ with A in (6.2.2) and multiply on the right by r'°^ 
to get[Stie58], for 1 < k and r^ -1 ^ defined to be zero, 

r^=(l +i k )r( k ~ 1 ' > +a k ArV e ~ 1 ' ) -7 fc r (A_2) . 

Replace r^ by b— j =k —2,k—l,k, and multiply on the left by A -1 to obtain 

x( fc ) = (1 +-7i)x^“^ d-a^r^ -1 ^ —^ k x^ k ~ 2 \ 

Initially, x^ is given and r^=b — Ax^. 

The iteration is usually expressed in terms of the iterant difference, x^ — x^~ l \ as 
in (5.1.1). 


8.4. A Method for the Roots of C fc _ 1 . A matrix will be derived, the eigenvalues of 
which are the roots of C k _ v 

Recall from §6.1 that an optimum residual polynomial of degree k is defined to be 
that residual polynomial, R k , that solves the weighted least squares problem 
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r r 

where P k is any residual polynomial of degree k. Also if {R, } is a set of optimum 
residual polynomials, then they form an orthogonal set with respect to the modified 
weight function, £iv(£); see (6.1.1). 

The roots of orthogonal polynomials may be computed by a stable algorithm based 
on the fact that the roots are the eigenvalues of a symmetric tridiagonal matrix, S fe . 
The algorithm is called the Stieltjes algorithm and matrix S*. is called the Jacobi matrix. 
The Stieltjes algorithm is recommended for computing the optimum Richardson’s 
method parameters, which are the reciprocals of the roots of the optimum residual 
polynomial. Matrix Sj. may be modified by one element to obtain a matrix the nonzero 
eigenvalues of which are roots of C k _ x . 

If only the roots of R k and C k _ l are desired, it- is not important that R fc (0) = 1. It 
is preferable to work with the normalized family 


1*(0 


Ri(0 I 
II Ri\\ (w '• 


6.4.1. The Roots of Rj.(f). The elements of matrix S fc are the coefficients of the 
three term recursion satisfied by {<£,• }. 

It is convenient to write the recursion (6.2.1) in the slightly different form 
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£<f>k - 1 ( f) — ’ s k ,k - 1 ^Jfc -2 ( 0 + 5 JfcJfe <f>k - 1 ( 0 + s k , k + 1 ( £)• 
The first three terms of the recursion are 


MO = s nM0 + s i 2 <f>i (0 

€&l (0 = s 21&o(0 + 5 22^l(0 + s 23 ^ 2(0 

MO = 5 32^l(0 + 5 33^2(0 +«34^ 3 (0» 

which may be written in matrix form as 



Hi) 


s n s 12 0 

i’nii) 


0 

e 

Hi) 

— 

S 21 s 22 5 23 

Hi) 

+ 5 34^3(^) 

0 


Hi) 


0 5 32 5 33 

Hi) 


1 


In general, k terms yield the matrix-vector equation 


^{0 =s k^{0+ s k,k+i < t>k{0h i 

where 

MOHMO.Hi) h-,(()) T , 

S k = (0, . . . , 0, 1, 0, , 0) T is the k th unit vector and S k =(s ij -) is the tridiagonal 

Jacobi matrix [Wilf62, GoWe69]. The eigenvalues of the Jacobi matrix coincide with the 
roots of <f > k . 

Next, a nested procedure in which the eigenvalues of S l ,...,S k are successively 
computed will be described for computing the roots of <j> k . Let {p^ :1 < j <i } be the 
roots of d > i ; these are the eigenvalues of S, . 
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The procedure begins with the initial polynomial (f > Q . Since <j> 0 is a constant such 
that || <£ 0 || =1, it follows that 


MO 



Next, to compute the root of <fi lt it follows from 


£MO = - s llM0+ s 12&l(0> j 
that, if <j> l is to be orthogonal to (f> 0 , 

Si = ( s n) = ((f^o>^o)£tu) • 

Of course, ^i(sn)=0. 

Now assume S fc _j has been computed, 2 <k. Since it is the (k— l)X(k — 1) principal 
submatrix of S k , only the last row and column of S k need be computed, a total of three 
nonzero elements. Since the polynomials are normalized, S k may be proved to be 
symmetric. Hence only s k _ l k and s kk are required. 

Matrix S Jt _ 1 yields the roots p 1 ...,p k _ lk _ x of <f> k _ v Let 
n k-liO—{t~ P\,k-\) ' ’ ' [i~Pk- l,Jb— l)* 

Then 




J’k-iiO 
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The elements to be computed are s k k _ v and s^. These are unknowns in the relations 
(with, of course, s k _ l k = 

£^*-2(0 =s k-l,k-2^k-3U) + s /c-l,k-l4 > k-2(t) + s k-l,k <t>k- ltf) 

and 


€ < f > k-l(t) ==s k,k-l < t > k-2{€)+ s kk < f > k-l(€)+ s k,k+l < f > kU)’ (6.4.1) 

Orthogonality yields 


s k-l,k —(£<f>k- 2» ^ifc-l)cw (6.4.2) 

and 


s kk l)^tw- (6.4.3) 

This completes the computation of S k . An algorithm (Algorithm 4) is given in §7. 


6.4.2. The Roots of C^ —1 . We have 


tCk-M-l-RM. 


<M°) 


it follows that 


Since R k (€) 
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^Jt(0=^o(6[^ifc( 0 )/^o(0]-^jt( 0 ) (fCfc-iU)] ■ 

Equation (6.4.1) therefore gives 

t<t>k-i{£) = hi < l > o(Q+ s k,k-i < J > k-2{£)+ s kk < i > k-i{€) ~ s k,k+i<f>k{ Q ) (fOjfc-iU)] > (6.4.4) 

where Ski :=s k,k+i < t > k{Q)/ < t > o{0' (Of course, <£ 0 (f) is a constant.) Define a lower 
Hessenburg matrix S^=(5 tJ ) by setting s,y:=s,y unless i=k,j = 1, in which case s kl 
has been defined. 

The equation 

&U) = Sjfe £) + s k ,k +\<f ) k ( £)&k 

now becomes 

The roots of ^C7 fc —1 (^) are therefore the eigenvalues of S k . 

6.5. Leading Coefficient Preparations are nearly complete for the 

computation of 

x (*) =x (°)+C' jfe _ 1 (A)r( 0 ) . 

There is one remaining detail, an expression for the leading coefficient g k _ 1 such that 
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c k-\{£)=9k-i{£.-°\) • • • 


Recall that 




R k{£)= < t>k{t)/ < l>k (°)> 


<f>k(£) 


*k CQ 
7r k II v 


1 


and 


7r k($)=(£-PlJc) • • • it-Pkk)- 


Therefore, 


_ -1 

St ~ l 1**11 f M 0 ) ' 


In the Chebyshev case, an alternative formula is given in the next section. 
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7. Algorithms. 

In this section, algorithms are given for the root finding algorithms for the general / 2 - 
case and also for the Chebyshev case. 


7.1. Algorithm for Normalized Residual Polynomials. 

Algorithm 4 • (Compute the recursion coefficients of a specified orthogonal family, 
and the roots and leading coefficients of the degree k orthogonal polynomial <j> k of the 
family.) 

Purpose: Generate the factored form of successive normalized (real) 

orthogonal polynomials, <f> it i =0,...,k; and the residual polynomial recursion 
parameters, {<*£, '■y*.}. Additional output is described below. If polynomials are 
optimum with respect to a weight function w, they are then orthogonal with respect to 
the (real) weight function £u>(f), and this will be the weight function used below. Note 
that R k {£) = (£)/^fc (0) * 1S a residual polynomial. 


Input. A subprogram must be provided to compute an inner product 
(/ ,g) = —J f (£)g(€)€w(£)d£ where L=fd£. In practice, this subprogram would 


M 


compute a discrete inner product (/ ,g) = — — JJ f (f t )g(£i )f, )• Input to the 

M i-i 


subprogram would be the number, M , of nodes, the nodes , i =1, and the 


weight, u>(£,-) (an array or external function). 
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Input to Algorithm 4 then consists of input to the subprogram, the subprogram 
itself, and the degree, k , of the highest degree normalized orthogonal polynomial. 

Output. The algorithm generates (l) a two dimensional array of roots, {py,- }, of <j> i , 
0 < i < k, required for (a modification of) Algorithm 1 in the non-Chebyshev case; (2) 

parameter , needed for Algorithm 2 in the special case k =1; (3) the Jacobi 

Pn 

matrix S^; (4) <f> 0 , and <j> k { 0), needed for Algorithm 5; (5) the recursion coefficients 
{a k , for the residual polynomials, needed for (a modification of) Algorithm 3; and 
(6) the array of leading coefficients, ^ , of <j >± , needed for Algorithm 5. 

Restrictions. Degree k must satisfy k <M. The restriction on T is that the nodes, 
{£,}, lie on the real line, which holds if A is Hermitian symmetric positive definite. 
However, it is not necessary that A be Hermitian symmetric positive definite. For 
example, if A is real nonsymmetric then J 1 may be taken to be the major axis of an 
ellipse enclosing the nonzero spectrum and the inner product taken to be the inner 
product defining Chebyshev polynomials; see §7.3. 

Notes. The reciprocals of the roots of <j> k are the r-parameters needed for 
Richardson’s method, and are general input for a modification of Algorithm 1. 

This algorithm directly computes the recursion coefficients, {sjy},~'for normalized 
polynomials; see (6.4.1). These are not the recursion coefficients, { a k , %}, for residual 
polynomials. An expression for the a k , 7 k coefficients in terms of the s,y coefficients is 
given as follows. 
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Since (£)/<^,- (0), (6.4.1) is equivalent to 


R k (0 = 


*k- i(o) 
s k,k+l < l > k (°) 


&k- 1(0 


Skk<h- 1 (°) 

s k jk+l^k (°) 


■Rk- 1(0 


2(0) 

s k,k+l ( l > k (°) 


*4-2(0 ' 


From (6.2.2), it follows that 


■S/fc.ifc + l^M 0 ) 
s k ,k+\ < t > k (®) 

For step 6.4) below, the monic polynomial, 7r t , is used to calculate the leading 

i 

coefficient of <j > i . Let the i roots of <j> i be Define 7T X (f) = II (f — p JX ). Let 


7T; 




Then 0 t (f) = 

i 

The factored form ^ t (£) = II (f — p ^ ) is recommended for evaluating <^, (£). 


1) 


Set v Q := 



Set <f> 0 := i/ 0 . 


2) Set s n •:= and se ^ Pn := 5 n> the ro °t °f • 


3) If /r = 1, return. 
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5) Set <^i(0) := V\{~P\\) • 

6) For 2 < i < A: , do: 

6.1) Set the first i — 2 elements of the last column of S^, column i, equal to 0. 

6.2) Formulas (6.4.2) and (6.4.3) give the remaining two elements. Set 


s i—l,i • ( £$i — 2> *f*i— 1 )ftu i 

s ii :== ( 1> 1 )^tu • 

6.3) Compute the eigenvalues of (the symmetric matrix) S^.. Set the roots, {p ; , }, 
of <j>i to these eigenvalues. 


6.4) 


Set i/j-: 





6.5) Set <t>i (0) := (-1)* v { p u ■ • ■ p i{ . 


6.6) Set a, 


s i,i+\ ( f > i (®) 


6.7) Set -7, 




Enddo. 
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7.2. Algorithm for the Grand-Leap Parameters. 

Algorithm, 5. (Compute S k , and the roots and leading coefficient of C k _i-) 

Purpose. Compute the a-parameters and parameter g k _ l needed for the grand-leap 
algorithm; these parameters are the roots and leading coefficient respectively of C k _±. 

Input. A matrix (styta+ixJfc+i, such as S^ t+i from Algorithm 4, nonzero parameters 
i> k , <j>Q, and <j> k (0), such as, also, from Algorithm 4, and period k . 

Output. The algorithm generates the k — 1 roots, {cr t }, and leading coefficient g k _ x 
of the “polynomial preconditioner” C k _ v These quantities become input for Algorithm 
2 . 

Restrictions. There are no restrictions other than those imposed on the input 
parameters. 

1) Initialize k >2 . 

2) Set s t j := s t J for 1 < * , j < A: will be reset in step 3).) 

3) Set s kl :=s k k+l <j> k (0)/<£„ • 

4) Set S k :=(s t y) . 

5) Compute the eigenvalues of S*.. Set the roots of C equal to the nonzero 
eigenvalues of S fc . 
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6) Set ^‘ 0) • 

7.3. The Chebyshev Case. For this special case, the grand-leap parameters may be 
determined with explicit inner products. 

7.3.1. Chebyshev Orthogonality. Assume d, c * 0, and 0 is not in the interval 
[d — c , d + c]. If d and c are real, this is equivalent to assuming d > 0, and d — \ c\ 
> 0. Let T, (fx) be the Chebyshev polynomial of degree i defined by the familiar 
recursion T 0 = l,T 1 (n)=ii, and for l<i, T i+1 (n)=2fiT i (fj.) — T i _ l (fx). 

Let [(£ — d)/c J be the shifted and translated Chebyshev polynomial. The 

0,-(£) 

Chebyshev residual polynomial is therefore — 

M°) 

orthogonality relations (where c > 0 if real) 


. The family : 0 < i } satisfies th^ 


d +C 

/ 

d-c 


J - 


7T, t 


n 


3=0 
= 3*0 


0,i *3 


where 


= 


c±-ij-di 


- 1/2 
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7.3.2. Recursions for the Shifted Chebyshev Polynomials. The recursion for T, 
yields a recursion for {xjj i }: 


£0 o =di/’o + c V , 1 (O 


and for 1 < i , 


b!>i ( 0 = y -i( 0 + A ( 0 + 1- A +1 • 


7.3.3. Roots of C jt _ 1 The roots of C*-! are among the eigenvalues of S^. An explicit 
expression for requires S k and <^ fc (0). To determine these quantities, the three term 
recursion for the normalized polynomials is needed. The normalized polynomials are: 


The recursion is 


wo 


MO 
II A || 



WO =rf< Ao(0+-^j- ^i(0 » 

WO = ^o(0 + d( t> i(0 + y MO » 


and for 2 < i , 
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(*M = f *-,(() +*+M+j ■(«)• 

7.3. 3.1. Matrix From the recursion for {^, }, it follows that 



To modify S k to obtain S^, <Ajk(0) is needed. For 1 < k, 



which follows from T k (fj,) = coshfccosh 
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7.3.3. 2. Matrix S k _ 1 . Therefore, 


s* 


d 

c 

VT 

0 


VT 

d — 

2 

^ d *- 
2 2 


(7.3.1) 


0 

^fc(O) c 

<t>Q 2 


c 

2 

0 


d — 
2 

i_ d 
2 


7.3.4. Leading Coefficient gfc_i- Since 


c*-i(0 


i-RM 


it suffices to find the leading coefficient of R * 


'PM 

^Jfe(O) ' 


The leading coefficient of V’fc(f) is easily obtained from the recursion 


V>o = 


MO = - — — M 

C 


and, for 2 < i, 
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AiO = 2 - — — i(0 - 0£- 2 (O • 

c 

\k 

12 

The leading coefficient of ^ is therefore — — . This combined with the expression 

^ 2 c 

for V’jb(O) gives 

1 

3k-l - - y 

cosh&cosh -1 — — 
c 

7.3.5. Algorithm for the Grand— Leap Parameters in the Chebyshev Case. 

Algorithm 6. ( Compute the parameters for the Grand-Leap Algorithm in the 
Chebyshev Case.) 

Purpose. Compute g^-ii t 0 , and a lf ..., cr^_i in the Chebyshev case as required for 
Algorithm 2. 

Input. Ellipse parameters d and c and period k. 

Output. The k — 1 roots, {a,- }, and leading coefficient g^-i of the polynomial 
preconditioner C k _ v 

Restrictions. If the grand-leap algorithm is to converge then the ellipse parameters must 
satisfy the same restrictions as in Algorithm 1. 




Notes. The algorithm uses a matrix S k that is not defined if c =0. In this case, the set 
of Chebyshev residual polynomials reduces to the family { R k = (f — d) k /d k : 0 < k }, 
which is not an orthogonal family for any weight function. The case c = 0 is often used 
in the Manteuffel algorithm in order to compute improved ellipse parameters adaptively. 
If one believed that c = 0 then Richardson’s method would converge in a single step if 
the matrix were normal, in which case no need exists for the grand-leap formulation. If 
one were computing ellipse parameters adaptively, then the parameter computation 
technique reduces to a sequence of matrix vector multiplications, and again the grand- 
leap formulation is not desired. For these reasons, if c is small relative to d the 
algorithm halts. The halting criterion is a comparison of | c /d | 2 to the machine epsilon, 
denoted in the algorithm by “mach eps” and defined to be the largest machine number, 
e, such that the floating point sum 1 + e equals the machine number 1. 

As a final note, there does exist an analog of S fc in the degenerate case (c = 0), and 
the algorithm control could branch to the computation of the eigenvalues of the analog 
of Sj. , but the lack of a practical need obviates this version. 


1) Set t x : 


1 

— c7r/2 +d 


2) If k = 1 or | c jd | < Vmach eps then return. 
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<M°) 


coshfccosh 1 

^( 0 ) 

1 V’fcll & 



4) Set the roots { a x , . . . , cr^-i} of C' A _ 1 equal to the nonzero eigenvalues of matrix 
(7.3.1). 


5) Set the leading coefficient f^./of C k _ x equal to 

( 

1 _ 1 _ 

2 c 

9k— l — ~ 

8. Summary. 

The leapfrog and grand-leap variants of Richardson’s method and a general second 
order method have been described. A comparison among the methods and variants 
shows that there are advantages either to omitting every other iterant or to omitting all 
iterants (except the last). 

The leapfrog and grand-leap variants require sets of parameters that may be 
computed from the eigenvalues of a matrix. In the leapfrog case, the matrix is the same 
as that which expresses the roots of a member of a family of orthogonal polynomials as 
the eigenvalues of a symmetric tridiagonal matrix. This matrix may be modified slightly 
to yield the parameters needed for the grand-leap variant. 



I 
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Algorithms for the leapfrog and grand-leap methods are given in the Chebyshev 
case. In the Chebyshev case, explicit values for the elements of the tridiagonal matrix are 
well known and need not be computed. 
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